arXiv:l502.01856v3 [astro-ph.CO] 28 Mar 2015 


Mon. Not. R. Astron. Soc. 000, 1-5 (2014) Printed 31 March 2015 (MN IATeX style file v2.2) 

PolyChord: nested sampling for cosmology 

W.J. Handley 1,2 *, M.P. Hobson 1 ! & A.N. Lasenby 1,2 ! 

1 Astrophysics Group, Cavendish Laboratory, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK 
2 Kavli Institute for Cosmology, Cambridge, Madingley Road, Cambridge, CB3 0HA, UK 

Received 6 February 2015 


ABSTRACT 

PolyChord is a novel nested sampling algorithm tailored for high dimensional pa¬ 
rameter spaces. In addition, it can fully exploit a hierarchy of parameter speeds such 
as is found in CosmoMC and CAMB. It utilises slice sampling at each iteration 
to sample within the hard likelihood constraint of nested sampling. It can identify 
and evolve separate modes of a posterior semi-independently and is parallelised using 
OPENMPI. PolyChord is available for download at: http: //ccpf orge. cse. rl. ac. 
uk/gf/project/polychord/ 
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1 INTRODUCTION 

Over the past two decades, the quantity and quality of astro- 
physical and cosmological data has increased substantially. 
In response to this, Bayesian methods have been increasingly 
adopted as the standard inference procedure. 

Bayesian inference consists of parameter estimation and 
model comparison. Parameter estimation is generally per¬ 
formed using Markov-Chain Monte-Carlo (MCMC) meth¬ 
ods, such as the Metropolis Hastings (MH) algorithm and 
its variants (Mackay 2003). In order to perform model com¬ 
parison, one must calculate the evidence ; a high-dimensional 
integration of the likelihood over the prior density (Sivia & 
Skilling 2006). MH methods cannot compute this on a usable 
timescale, hindering the use of Bayesian model comparison 
in cosmology and astroparticle physics. 

A contemporary methodology for computing evidences 
and posteriors simultaneously is provided by Nested Sam¬ 
pling (Skilling 2006). This has been successfully imple¬ 
mented in the now widely adopted algorithm MultiNest 
(Feroz & Hobson 2008; Feroz et al. 2009, 2013). Modern cos¬ 
mological likelihoods now involve a large number of parame¬ 
ters, with a hierarchy of speeds. MultiNest struggles with 
high-dimensional parameter spaces, and is unable to take 
advantage of this separation of speeds. PolyChord aim 
to address these issues, providing a means to sample high¬ 
dimensional spaces across a hierarchy of parameter speeds. 

This letter is a brief overview of our algorithm, which is 
now in use in several cosmological applications (Planck Col¬ 
laboration XX 2015). It will be followed in the near future 
by a longer and more extensive paper. 


* wh260@mrao.cam.ac.uk 
f mph@mrao.cam.ac.uk 
j a.n.lasenby@mrao.cam.ac.uk 


PolyChord is available for download from the link at 
the end of the paper. 


2 BAYESIAN INFERENCE 


Given some dataset T>, one may use a model M with param¬ 
eters 9 to calculate a likelihood P(#|D, M) = Cm{9). This 
may be inverted using Bayes’ theorem to find the posterior 
distribution on the parameters: 

P(6\V,M) = V m ( 0)= Cm ^ mW , ( 1 ) 

-C-M 

where P(0|A1) = i ^ m ( 9 ) is our prior degree of belief on the 
values of the parameters and P(T>\M) = Zm is the evidence 
or marginal likelihood, calculated as: 

Z M = j jC. M (0)n M {9)d8. (2) 

One may use the evidence Zm to perform model compar¬ 
ison, comparing Zm to the evidences of other competing 
models {Mi, M 2 , ■ ■ ■}■ Applying Bayes theorem to model 
comparison yields: 


V{Mi\V) 


V(V\Mi)V{Mi) 

P(D) 


Z% 7T i 


( 3 ) 


where P(Mi) = m is the prior probability of a model Mi 
(usually taken as uniform). Since the evidences Zi play a 
key role in Bayesian model comparison, so their calculation 
is vital for this aspect of inference. For a full discussion of 
Bayesian inference, we recommend Sivia & Skilling (2006) 
and part IV of Mackay (2003). 
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3 NESTED SAMPLING 

Skilling (2006) constructed an ingenious method of sampling 
which enables one to estimate the evidence integral (2) effi¬ 
ciently. 

One begins by drawing n “live points” uniformly from 
the prior. At iteration i, the point with the lowest likelihood 
Li is replaced by a new live point drawn uniformly from 
the prior with the constraint that its likelihood L > L,. One 
may estimate the fraction of the prior A'; enclosed by the iso- 
likelihood contour Li, since it will be the lowest volume of n 
volumes drawn uniformly from [0, A,_i]. X , thus satisfies: 

Xi = tiXi- U P{ti) = ntr\ X 0 = l (4) 
We may then calculate the evidence by quadrature: 

2 tz£jCi(Xi~i-Xi). (5) 

i=l 

and the mean and variance of the random variable Z 
may be inferred using equation (4). The posterior mass 
contained within the live points may be estimated as 
Zuve « (£}ii ve A'ij ve , and convergence is reached when Zu ve 
is some small fraction of Z (Keeton 2011). 

In addition, the pool of dead points may be used to 
construct a set of posterior samples for use in parameter 
estimation. For further details on the nested sampling pro¬ 
cedure, see Sivia & Skilling (2006). 


4 SAMPLING WITHIN AN ISO-LIKELIHOOD 

CONTOUR 

The difficult step in nested sampling is sampling from the 
prior subject to the hard likelihood constraint L > Li. The 
prior may be sampled using inverse transform sampling, so 
that sampling effectively occurs in the unit hypercube (Feroz 
et al. 2009). How one samples in this space under the hard 
likelihood constraint is where variations of the algorithm 
differ. 

The MultiNest algorithm samples by using the live 
points to construct a set of intersecting ellipsoids which to¬ 
gether aim to enclose the likelihood contour, and then sam¬ 
ples by rejection sampling within it. Whilst being an excel¬ 
lent algorithm for small number of parameters, any rejection 
sampling algorithm has an exponential scaling with dimen¬ 
sionality. 

Galilean Sampling (Feroz & Skilling 2013; Betancourt 
2011) samples by an MCMC-chain based approach using 
reflection off iso-likelihood contours. This however suffers 
from the need to tune parameters, and the requirement of 
likelihood gradients. 

Diffusive nested sampling (Brewer et al. 2009) is an al¬ 
ternative and promising variation on Skilling’s (2006) algo¬ 
rithm, which utilises MCMC to explore a mixture of nested 
probability distributions. Since it is MCMC based, it scales 
well with dimensionality. In addition, it can deal with multi¬ 
modal and degenerate posteriors, unlike traditional MCMC. 
It does however have multiple tuning parameters. 

We find that Neal’s (2000) slice sampling is well suited 
to the problem of drawing points from within an iso- 
likelihood contour. In the one-dimensional case, he suggests 
the sampling procedure detailed in Figure 1. In the N- 
dimensional case, he suggests multiple methods, the most 



Figure 1. Slice sampling in one dimension. Given a likelihood 
slice £o, a seed point xo and an initial parameter w, slice sampling 
generates a new point x\ within the horizontal region defined by 
L > Ci,. A point x is within the “slice” if C(x) > Co- External 
bounds are first set L < xo < R by expanding a random initial 
bound of width w until they lie outside the slice via the stepping 
out procedure, xi is then sampled uniformly within these bounds. 
If x i is not in the slice, then the bound is contracted down to 
x i, and x i is then drawn again from these new bounds. This 
procedure is guaranteed to generate a new point x i, and satisfies 
detailed balance P(xq\xi) = P(a;i|a;o). Thus, if xo is drawn from 
a uniform distribution within the slice, so is xi. 

simple being to perform one dimensional slice sampling of 
each parameter in turn (in a random order). This method 
has been implemented in nested sampling by Aitken & Ak- 
man (2013) to great effect on some simple examples. How¬ 
ever, like Gibbs and MH sampling, this methodology strug¬ 
gles with degenerate distributions. PolyChord aims to ad¬ 
dress this issue. 


5 THE PolyChord ALGORITHM 

PolyChord implements several novel features compared 
to Aitken & Akman’s (2013) slice-based nested sampling. 
We utilise slice sampling in a manner that uses the infor¬ 
mation in the live points to deal with degeneracies in the 
contour. We also implement a general clustering algorithm 
which identifies and evolves separate modes in the poste¬ 
rior semi-independently. The algorithm is written in FOR- 
TRAN95 and parallelised using OPENMPI. It is optimised 
for the case where the dominant cost is the generation of a 
new live point. This is frequently the case in astrophysical 
applications, either due to high dimensionality, or to costly 
likelihood evaluation. In addition, it has the option of imple¬ 
menting fast-slow parameters, which is extremely effective 
in its combination with CosmoMC. This is termed COSMO- 
Chord, which may be downloaded from the link at the end 
of the paper. 

Our generic TV-dimensional slice-sampling procedure is 
detailed in Figure 2. 

5.1 Dealing with degenerate contours 

To solve degenerate cases, we make a linear transformation 
to “whiten” the space, so that in general the contour has size 
~ 0(1) in every direction. Uniform sampling is preserved 
under affine transformations, so this strategy is valid. 

In order to find such a linear transformation, note that 
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at every step of the algorithm one has some knowledge of 
the dimensions of the contour, provided by the set of live 
points and all of the inter-chain points. We take the covari¬ 
ance matrix of these points as a reasonable descriptor of the 
correlation. The Cholesky decomposition T of the covariance 
matrix E = TT t is a good choice of transformation so that: 

T- 1 x = y (6) 

transforms from x in the unit hypercube to y in the sam¬ 
pling space. Working in the sampling space, a reasonable ap¬ 
proach that mixes well is to start from a random live point 
as the seed, choose a randomly oriented orthonormal basis, 
and sample along these directions in a random order. For 
difficult distributions, one may require multiple iterations of 
this procedure to ensure that the final point is fully decor- 
related from the start point. This is illustrated further in 
Figure 2. Within the sampling space, the initial step size 
may be chosen as w = 1. 

This procedure has the advantage of being dynamically 
adaptive, and requires no tuning parameters. However, this 
“whitening” process is ineffective for pronounced curving 
degeneracies. 


5.2 Clustering 

Multi-modal posteriors are a challenging problem for any 
sampling algorithm. “Perfect” nested sampling (i.e. the en¬ 
tire prior volume enclosed by the iso-likelihood contour is 
sampled uniformly) in theory solves multi-modal problems 
as easily as uni-modal ones. In practice however, there are 
two issues. 

First, one is limited by the resolution of the live points. 
If a given mode is not populated by enough live points, it 
runs the risk of “dying out”. Indeed, a mode may be en¬ 
tirely missed if the algorithm is not sufficiently resolved. In 
many cases, this problem can be alleviated by increasing the 
number of live points. 

Second, and more importantly for PolyChord, the 
sampling procedure may not be appropriate for multi-modal 
problems. We “whiten” the unit hypercube using the co- 
variance matrix of live points. For far-separated modes, the 
covariance matrix will not approximate the dimensions of 
the contours, but instead falsely indicate a high degree of 
correlation. It is therefore essential for our purposes to have 
PolyChord recognise and treat modes appropriately. 

This methodology splits into two distinct parts; (i) 
recognising that clusters are there, and (ii) evolving the clus¬ 
ters semi-independently. 


5.2.1 Recognition of clusters 

Any cluster recognition algorithm can be substituted at this 
point. One must take care that this is not run too often, or 
one runs the risk of adding a large overhead to the calcu¬ 
lation. In practice, checking for clustering every ~ 0(nu ve ) 
iterations is sufficient. We encourage users of PolyChord 
to experiment with their own preferred cluster recognition, 
in addition to that provided and described below. 

It should be noted that the live points of nested sam¬ 
pling are amenable to most cluster recognition algorithms 


for two reasons. First, all clusters should have the same den¬ 
sity of live points in the unit hypercube, since they are uni¬ 
formly sampled. Second, there is no noise (i.e. outside of the 
likelihood contour there will be no live points). Many clus¬ 
tering algorithms struggle when either of these two points 
are not satisfied. 

We therefore choose a relatively simple variant of the k- 
nearest neighbours algorithm to perform cluster recognition. 
If two points are within one another’s ft-nearest neighbours, 
then these two points belong to the same cluster. We iter¬ 
ate k from 2 upwards until the clustering becomes stable 
(the cluster decomposition does not change from one k to 
the next). If sub-clusters are identified, then this process is 
repeated on the new sub-clusters. 

5.2.2 Evolving the clusters semi-independently 

An important novel feature comes from what one does once 
clusters are identified. 

First, when spawning from an existing live point, the 
whitening procedure is now defined by the covariance matrix 
of the live points within that cluster. This solves the issue 
detailed above. 

Second, PolyChord would naively spawn live points 
into a mode with a probability proportional to the number 
of live points in that mode. In fact, what it should be doing 
is to spawn in proportion to the volume fraction of that 
mode. These should be the same, but the difference between 
these two ratios will exhibit random-walk like behaviour, 
and can lead to biases in evidence calculations, or worse, 
cluster death. Instead, one can keep track of an estimate of 
the volume in each cluster, using an approached based on 
(4), and choose the mode to spawn into in proportion to 
that estimate. This methodology will be fully documented 
in the more extensive paper. 

Thus, the point to be killed off is still the global lowest- 
likelihood point, but we control the spawning of the new live 
point into clusters by using our estimates of the volumes 
of each cluster. We call this ‘semi-independent’, because it 
retains global information, whilst still treating the clusters 
as separate entities. 

When spawning within a cluster, we determine the clus¬ 
ter assignment of the new point by which cluster it is nearest 
to. It does not matter if clusters are identified too soon; the 
evidence calculation will remain consistent. 


5.3 Parallelisation 

Currently, PolyChord is parallelised by openMPI using a 
master-slave structure. One master process takes the job 
of organising all of the live points, whilst the remaining 
Uprocs — 1 “slave” processes take the job of finding new live 
points. 

When a new live point is required, the master process 
sends a random live point and the Cholesky decomposition 
to a waiting slave. The slave then, after some work, signals 
to the master that it is ready and returns a new live point 
and the inter-chain points to the master. 

A point generated from an iso-likelihood contour Ci 
is usable as a new live point for an iso-likelihood contour 
Cj > Ci, providing it is within both contours. One may keep 
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Figure 2. Slice Sampling in N dimensions. We begin by “whitening” the unit hypercube by making a linear transformation which turns 
a degenerate contour into one with dimensions ~ (9(1) in all directions. This is a linear skew transformation defined by the inverse of the 
Cholesky decomposition of the live points’ covariance matrix. We term this whitened space the sampling space. Starting from a randomly 
chosen live point a?o, we pick a random direction and perform one dimensional slice sampling in that direction (Figure 1), using w = 1 
in the sampling space. This generates a new point x± in ~ G (a few) likelihood evaluations. This process is repeated ~ (9(ridims) times 
to generate a new uniformly sampled point xjsf which is decorrelated from xq. 



^procs/^live 


Figure 3. Parallelisation of PolyChord. The algorithm paral¬ 
lelises nearly linearly, providing that n pr0 cs < ^live- For most 
astronomical applications this is more than sufficient. 


slaves continuously active, and discard any points returned 
which are not usable. The probability of discarding a point 
is proportional to the volume ratio of the two contours, so if 
too many slaves are used, then most will be discarded. The 
parallelisation goes as: 


Speedup (nprocs) = nn V e log 


1 + 


Nprocs 

^live 


(7) 


and is illustrated in Figure 3. 


5.4 Fast-slow parameters and CosmoChord 

In cosmological applications, likelihoods exhibit a hierarchy 
of parameters in terms of calculation speed (Lewis 2013). 


The effect of this is that the likelihoods may be quickly re¬ 
calculated if one changes certain subsets of the parameters. 
In PolyChord it is very easy to exploit such a hierarchy. 
Our transformation to the sampling space is laid out so that 
if parameters are ordered from slow to fast, then this hierar¬ 
chy is automatically exploited: A Cholesky decomposition, 
being a upper-triangular skew transformation, mixes each 
parameter only with faster parameters. Further to this, one 
may use the fast directions to extend the chain length by 
many orders of magnitude. This helps to ensure an even 
mixing of live points. 


6 PolyChord IN ACTION 
6.1 Gaussian Likelihood 

As an example of the efficacy of PolyChord, we compare 
it to MultiNest on a Gaussian likelihood in D dimensions. 
In both cases, convergence is defined as when the posterior 
mass contained in the live points is IIP 2 of the total cal¬ 
culated evidence. We set nii ve = 25 D, so that the evidence 
error remains constant with D. MltltiNest was run in its 
default mode with importance nested sampling and expan¬ 
sion factor e = 0.1. Whilst constant efficiency mode has the 
potential to reduce the number of MultiNest evaluations, 
the low efficiencies required in order to generate accurate 
evidences negate this effect. 

With these settings, PolyChord produces consistent 
evidence and error estimates with an error ~ 0.4 log units 
(Figure 5). Using importance nested sampling, MultiNest 
produces estimates that are within this accuracy. 

Figure 4 shows the number of likelihood evaluations Nc 
required to achieve convergence as a function of dimension¬ 
ality D. 
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Number of dimensions, D 

Figure 4. Comparing PolyChord with MultiNest using a 
Gaussian likelihood for different dimensionalities. PolyChord 
has at worst Nc ~ 0(D 3 ), whereas MultiNest has an expo¬ 
nential scaling that emerges at high dimensions. 



Number of dimensions, D 

Figure 5. Evidence estimates and errors produced by Poly¬ 
Chord for a Gaussian likelihood as a function of dimensionality. 
The dashed line indicates the correct analytic evidence value. 


Even on a simple likelihood such as this, PolyChord 
shows a significant improvement over MultiNest in scal¬ 
ing with dimensionality. PolyChord at worst scales as 
Nc ~ 0(D 3 ), whereas MultiNest has an exponential scal¬ 
ing which emerges in higher dimensions. 

However, we must point out that a good rejection al¬ 
gorithm like MultiNest will always win in low dimensions. 
MultiNest is also extremely effective at navigating pro¬ 
nounced curving degeneracies in low dimensions, whereas 
PolyChord must use very long chains in order to navigate 
such contours. 


6.2 CosmoChord 

PolyChord’s real strength lies in its ability to exploit 
a fast-slow hierarchy common in many cosmological ap¬ 
plications. We have successfully implemented PolyChord 
within CosmoMC, termed CosmoChord. The traditional 
Metropolis-Hastings algorithm is replaced with nested sam¬ 


pling. This implementation is available to download from 
the link at the end of the paper. 

This combination has been effectively implemented in 
multiple cosmological applications in the latest Planck pa¬ 
per describing constraints on inflation (Planck Collaboration 
XX 2015), including application to a 37-parameter recon¬ 
struction problem (4 slow, 19 semi-slow, 14 fast). 


7 CONCLUSIONS 

We have introduced PolyChord, a novel nested sampling 
algorithm tailored for high dimensional parameter spaces. 
It is able to fully exploit a hierarchy of parameter speeds 
such as is found in CosmoMC and CAMB (Lewis & Bridle 
2002; Lewis et al. 2000). It utilises slice sampling at each 
iteration to sample within the hard likelihood constraint of 
nested sampling. It can identify and evolve separate modes 
of a posterior semi-independently and is parallelised using 
OPEnMPI. We demonstrate its efficacy on a toy problem. A 
further more detailed paper will follow imminently. 
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DOWNLOAD LINK 

PolyChord is available for download from: 

http://ccpforge.cse.rl.ac.uk/gf/project/polychord/ 
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